Analysis on Rental Price, Location and Facilities of LSE Accommodation Halls
Table of Contents
1. Introduction
2. Data Acquisition
3. Data Preparation
4. Data Analysis
4.1 Exploratory Data Analysis (EDA)
4.2 What is the relationship between the prices of each hall and their distance to campus?
4.3 How do prices vary across different room types in each hall?
4.4 Is the price worth what the hall provides in their facilities and catering system?
4.5 Other possible factors that might affect price
1. Introduction
We are General Course students studying abroad at LSE. Coming from different countries, most of us choose to live in an LSE accommodation hall during our stay in London, and we recognize the need to streamline the accommodation search, which can be confusing. Choosing student accommodation over finding a private place is only the first of many decisions, as there is a plethora of options. The halls are not solely affiliated with LSE and are not located on campus but are spread across central London. Each hall is also unique in its location, facilities, catering system, and price, so it can be challenging to find a hall that suits one’s needs.
Most new students resort to spreadsheets to compare halls side by side, or open dozens of browser tabs until they lose track of the details. We hope to simplify this process so that the accommodation search is more time-efficient and better suited to incoming students. The current “Refine” search bar on the LSE Accommodation website (https://www.lse.ac.uk/student-life/accommodation/search-accommodation) gives some insight into the halls, but it is not very effective or detailed unless you open a separate link for each hall.
Our contribution lies in deconstructing the original website to extract data for every LSE hall and presenting it to students more effectively through visualization. We hope to address the inconvenience of the website's original design and let our project act as a supplementary tool for students choosing where to live. We will also evaluate each hall's value and how its price differs from that of other halls based on location and the facilities offered. With this motivation in mind, here are the questions we explore.
Which is the hall that is best suited for my needs?
- What is the relationship between the prices of each hall and their distance to campus?
- How do prices vary across different room types in each hall?
- Is the price worth what the hall provides in their facilities and catering system?
2. Data Acquisition
To tackle our questions, we need to pull information from each hall's individual page on the LSE Accommodation website. From this we can create multiple dataframes covering prices, room types, facilities, distance and travel time to campus, nearest tube stations, and so on. Here are the steps we took:
- Extract the url of each hall from the LSE Accommodation search page
We first pulled each hall's hyperlink from the LSE Accommodation search page (https://www.lse.ac.uk/student-life/accommodation/search-accommodation) and stored the links in a list, which makes it easier to scrape the more detailed key facts from the individual pages.
from bs4 import BeautifulSoup
import requests

all_hyperlinks = []
# The search results span two pages (pageIndex 1 and 2)
for page_num in range(1, 3):
    url = f"https://www.lse.ac.uk/student-life/accommodation/search-accommodation?collection=lse-accommodation&pageIndex={page_num}&sort=metaavailability"
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # Each hall is listed as a card whose title wraps the link to its page
        accommodation_titles = soup.find_all('h2', class_='card__title')
        hyperlinks = []
        for title in accommodation_titles:
            hyperlink = title.find('a')['href']
            hyperlinks.append(hyperlink)
        all_hyperlinks.extend(hyperlinks)
    except Exception as e:
        print("An error occurred:", e)
print(all_hyperlinks)
['http://www.lse.ac.uk/student-life/accommodation/halls/college-hall/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/international-hall/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/butlers-wharf-residence/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/bankside-house/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/carr-saunders-hall/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/connaught-hall/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/high-holborn-residence/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/urbanest-westminster-bridge/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/lilian-knowles-house/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/passfield-hall/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/nutford-house/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/rosebery-hall/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/the-garden-halls/home.aspx', 'http://www.lse.ac.uk/student-life/accommodation/halls/sidney-webb-house/home.aspx']
- From each hyperlink, scrape information from individual webpages and combine into dataframes
The information on each individual webpage is spread across the page, so we created 3 dataframes, one for each area of the page.
Dataframe 1 holds information extracted from the webpage header: the accommodation name, address, a price range that gives a basic overview, and the accommodation’s distance to campus. Because travel time depends directly on distance, we also put the travel times extracted from the main body paragraphs in this dataframe.
The scraping code finds the following elements:
- the h1 element with the class 'heroBanner_title', which contains the name of the accommodation;
- the p element with the class 'heroBanner_address', which contains the address of the accommodation;
- the div element with the class 'accommKeyDetails_dist', which contains the distance to campus from each hall;
- the p element with the class 'accommKeyDetails_price', which contains the price range of each hall;
- the article element with the class 'pageContent accommContent', which is used to calculate the travel time needed to reach campus from each hall.
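The header lookups described above can be sketched roughly as follows. This is a minimal illustration and not the code from our scraping notebook: the sample HTML, the helper names `text_of` and `parse_header`, and the output keys are assumptions made for the example, and only the tag and class names are taken from the description above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the tag and class names come from the description above.
SAMPLE_HTML = """
<h1 class="heroBanner_title">Example Hall</h1>
<p class="heroBanner_address">Example Hall, 1 Example Street, London</p>
<div class="accommKeyDetails_dist">1.5km from campus</div>
<p class="accommKeyDetails_price">227 - 458 per week</p>
"""


def text_of(soup, tag, css_class):
    """Return the stripped text of the first matching element, or None if absent."""
    element = soup.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


def parse_header(html):
    """Collect the header fields of one hall page into a dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "Name": text_of(soup, "h1", "heroBanner_title"),
        "Address": text_of(soup, "p", "heroBanner_address"),
        "Distance": text_of(soup, "div", "accommKeyDetails_dist"),
        "Price Range": text_of(soup, "p", "accommKeyDetails_price"),
    }


header = parse_header(SAMPLE_HTML)
print(header)
```

Returning None for a missing element, rather than raising an error, keeps one incomplete page from stopping the whole scraping loop; the gaps can then be filled in during data preparation.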
Dataframe 2 goes deeper into the details of each room type in the hall, which appear only in the bottom half of the page. This dataframe contains the specific room types, whether the bathroom is private or shared, the approximate room size, and the weekly contract cost.
Room Types are stored in 'h2', class_='roomlist__title'. Bathroom Types are stored in 'p', class_="roomlist__position". Descriptions of each room type are stored in 'div', class_='roomataGlance', and we extract prices through class_='roomataGlance__figure'. For size approximation we find all 'p' which contain text like 'Size approx'.
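As an illustration of the 'Size approx' step, the sketch below pulls the size value out of a paragraph's text with a regular expression. The exact wording of the paragraphs on the webpage is an assumption here (the two sample strings are made up), and `extract_size` is a hypothetical helper, not a function from our notebook.

```python
import re

# Matches a number or a "low - high" range that follows the words "Size approx".
SIZE_PATTERN = re.compile(
    r"Size approx\D*?(\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?)",
    re.IGNORECASE,
)


def extract_size(text):
    """Return the size (or size range) as a string, or None if none is found."""
    match = SIZE_PATTERN.search(text)
    return match.group(1) if match else None


# Made-up paragraph texts in the assumed format
print(extract_size("Size approx: 8 - 14 m²"))
print(extract_size("Size approx. 13.4m²"))
```

Keeping a range such as "8 - 14" as a string at this stage leaves the decision of how to summarize it (for example, by taking the midpoint) to the data preparation step.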
Dataframe 3 focuses on the sidebar of each webpage, which gives the total number of bed spaces and all the facilities the hall provides.
The scraping code first finds the h3 element with the class 'ataGlance__title--types', which contains the title for room types. Then, it locates the ul element with the class 'ataGlance__list', which contains the list of room types. Next, it finds all li elements within this list with the class 'ataGlance__item'. It then iterates through each li item and extracts the room type name and quantity. For facilities extraction, it finds the h3 element with the class 'ataGlance__title', which contains the title for facilities. It then locates all ul elements with the class 'ataGlance__list' to extract the facilities information.
The scraping code and detailed steps are provided in Scraping Code and Creating Dataframes.ipynb. Please note that, depending on the type of scraping result, we created sub-dataframes before merging them into the final three dataframes.
3. Data Preparation
Before converting the three dataframes into csv files, we did some basic cleaning on the raw dataframes, including filling in missing data, splitting string information into separate columns, standardizing symbols and wording, and changing data types. The specific changes we made and all the code are provided in Scraping Code and Creating Dataframes.ipynb.
3.1 Dataframe 1
Before forming dataframe 1, we separated the train stations into 4 columns rather than putting all stations in one column. We changed every station name referring to "King's Cross" to "King's Cross/St Pancras" and made sure they all use the same kind of apostrophe. We also converted the numerical columns from object to numeric types.
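The station cleaning can be sketched as below. This is only an illustration under assumptions: we assume here that the raw stations arrive as one comma-separated string, and `normalize_station` and `split_stations` are hypothetical helper names rather than the functions in our notebook.

```python
def normalize_station(name):
    """Use one apostrophe style and one spelling for King's Cross."""
    # Replace typographic apostrophes with the plain ASCII one
    cleaned = name.strip().replace("\u2019", "'").replace("\u2018", "'")
    if cleaned.lower().startswith("king's cross"):
        return "King's Cross/St Pancras"
    return cleaned


def split_stations(raw, n_columns=4):
    """Split a comma-separated station string into exactly n_columns entries."""
    stations = [normalize_station(part) for part in raw.split(",") if part.strip()]
    # Pad with None so every hall fills the same number of columns
    stations += [None] * (n_columns - len(stations))
    return stations[:n_columns]


print(split_stations("Russell Square, Holborn, Euston, King\u2019s Cross"))
print(split_stations("Liverpool Street, Shoreditch High Street"))
```

Padding with None is what later shows up as NaN in the "Station 3" and "Station 4" columns for halls with fewer nearby stations.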
import pandas as pd
file = 'data/distance_data.csv'
df1 = pd.read_csv(file)
df1.head()
| | Name | Address | Distance to Campus(km) | Price Range(£/week) | On Foot(min) | By Bike(min) | By Public Transport(min) | Station 1 | Station 2 | Station 3 | Station 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | urbanest Westminster Bridge | urbanest Westminster Bridge, 203 Westminster B... | 1.5 | 227-458 | 25 | 9 | 11 | Westminster | Waterloo | Lambeth North | NaN |
| 1 | Lilian Knowles House | Lilian Knowles House, 50 Crispin Street, Londo... | 2.9 | 198-336 | 45 | 19 | 25 | Liverpool Street | Shoreditch High Street | NaN | NaN |
| 2 | College Hall | College Hall, University of London, Malet Stre... | 1.2 | 289-392 | 22 | 8 | 16 | Goodge Street | Euston Station | Russell Square | King's Cross/St Pancras |
| 3 | International Hall | International Hall, University of London, Lans... | 1.0 | 266-321 | 19 | 8 | 13 | Russell Square | Holborn | Euston | King's Cross/St Pancras |
| 4 | Butler's Wharf Residence | Butler's Wharf Residence, 11 Gainsford Street,... | 3.2 | 127-278 | 51 | 22 | 34 | London Bridge | Tower Hill | Bermondsey | NaN |
For further analysis, we created three new columns holding the lower bound, upper bound, and average of "Price Range(£/week)". For halls that have fewer than 4 nearby stations, we left the missing entries as NaN.
split_column = df1['Price Range(£/week)'].str.split('-', expand=True)
split_column = split_column.astype(float)
df1[['Price Min', 'Price Max']] = split_column
average = (df1['Price Min'] + df1['Price Max']) / 2
df1['Average Price'] = average
df1.head()
| | Name | Address | Distance to Campus(km) | Price Range(£/week) | On Foot(min) | By Bike(min) | By Public Transport(min) | Station 1 | Station 2 | Station 3 | Station 4 | Price Min | Price Max | Average Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | urbanest Westminster Bridge | urbanest Westminster Bridge, 203 Westminster B... | 1.5 | 227-458 | 25 | 9 | 11 | Westminster | Waterloo | Lambeth North | NaN | 227.0 | 458.0 | 342.5 |
| 1 | Lilian Knowles House | Lilian Knowles House, 50 Crispin Street, Londo... | 2.9 | 198-336 | 45 | 19 | 25 | Liverpool Street | Shoreditch High Street | NaN | NaN | 198.0 | 336.0 | 267.0 |
| 2 | College Hall | College Hall, University of London, Malet Stre... | 1.2 | 289-392 | 22 | 8 | 16 | Goodge Street | Euston Station | Russell Square | King's Cross/St Pancras | 289.0 | 392.0 | 340.5 |
| 3 | International Hall | International Hall, University of London, Lans... | 1.0 | 266-321 | 19 | 8 | 13 | Russell Square | Holborn | Euston | King's Cross/St Pancras | 266.0 | 321.0 | 293.5 |
| 4 | Butler's Wharf Residence | Butler's Wharf Residence, 11 Gainsford Street,... | 3.2 | 127-278 | 51 | 22 | 34 | London Bridge | Tower Hill | Bermondsey | NaN | 127.0 | 278.0 | 202.5 |
df1.dtypes
Name                          object
Address                       object
Distance to Campus(km)       float64
Price Range(£/week)           object
On Foot(min)                   int64
By Bike(min)                   int64
By Public Transport(min)       int64
Station 1                     object
Station 2                     object
Station 3                     object
Station 4                     object
Price Min                    float64
Price Max                    float64
Average Price                float64
dtype: object
3.2 Dataframe 2
Before forming dataframe 2, we dropped the unit symbols such as "£/week" and "m²", manually added some missing data in "Size Approximation(m²)" by referring to the original website, and standardized the labels under "Room Type" and "Bathroom Type" so that they can be grouped. For example, some "Twin en suite room" entries were shown as "Twin en suite", so we unified the wording.
file = 'data/contract_data.csv'
df2 = pd.read_csv(file)
df2.head()
| | Name | Room Type | Bathroom Type | Price(£/week) | Size Approximation(m²) |
|---|---|---|---|---|---|
| 0 | urbanest Westminster Bridge | Single room | Shared bathroom | 285.23-310.76 | 8.5 |
| 1 | urbanest Westminster Bridge | Single en suite room | Private bathroom | 286.90-346.90 | 13.4 |
| 2 | urbanest Westminster Bridge | Twin en suite room | Private bathroom | 227.09-240.22 | 25.3 |
| 3 | urbanest Westminster Bridge | Single studio | Private bathroom | 420.03-458.32 | 22.1 |
| 4 | Lilian Knowles House | Single en suite room | Private bathroom | 198.86-242.35 | 8 - 14 |
For further analysis, we took the midpoint of "Price(£/week)" and "Size Approximation(m²)" wherever the value is given as a range.
import numpy as np
def clean_price(price):
    # A range such as "285.23-310.76" is replaced by its midpoint
    if '-' in price:
        low, high = price.split('-')
        return (float(low.strip()) + float(high.strip())) / 2
    return float(price.strip())

def clean_size(size):
    # A range such as "8 - 14" is replaced by its midpoint
    if ' - ' in size:
        low, high = size.split(' - ')
        return (float(low.strip()) + float(high.strip())) / 2
    return float(size.strip())
df2['Size Approximation(m²)'] = df2['Size Approximation(m²)'].apply(clean_size)
df2['Price(£/week)'] = df2['Price(£/week)'].apply(clean_price)
decimal_places = 2
df2['Price(£/week)'] = df2['Price(£/week)'].round(decimal_places)
df2['Size Approximation(m²)'] = df2['Size Approximation(m²)'].round(decimal_places)
df2.head()
| | Name | Room Type | Bathroom Type | Price(£/week) | Size Approximation(m²) |
|---|---|---|---|---|---|
| 0 | urbanest Westminster Bridge | Single room | Shared bathroom | 298.00 | 8.5 |
| 1 | urbanest Westminster Bridge | Single en suite room | Private bathroom | 316.90 | 13.4 |
| 2 | urbanest Westminster Bridge | Twin en suite room | Private bathroom | 233.66 | 25.3 |
| 3 | urbanest Westminster Bridge | Single studio | Private bathroom | 439.17 | 22.1 |
| 4 | Lilian Knowles House | Single en suite room | Private bathroom | 220.60 | 11.0 |
df2.dtypes
Name                       object
Room Type                  object
Bathroom Type              object
Price(£/week)             float64
Size Approximation(m²)    float64
dtype: object
3.3 Dataframe 3
Dataframe 3 is a bit different from the first two dataframes. When starting off, we scraped the webpage under the facilities tab of each accommodation, which produced a list for each hall containing all the information we need. Here is an example of the scraping output:
Hyperlink: http://www.lse.ac.uk/student-life/accommodation/halls/urbanest-westminster-bridge/home.aspx
Room Types:
Bed spaces in total 669
Single 246
Single studio 36
Single en suite 331
Twin en suite (shared) 56
Facilities:
24-hour staff cover
Accessible rooms
Bicycle storage
Common room
Communal TV
Computer room
Lift access
Non-smoking
Printing facilities
Projector/Cinema room
Quiet study space
Secure entrance
Self-catered
Self-service laundry
WiFi
To make the information usable for visualization, we separated the room types each hall offers from the total number of bed spaces, since the webpage layout groups them together. Room types are already covered in the other dataframes, so here we dropped them and kept only the total number of bed spaces.
For the facilities, each webpage lists only the facilities a hall has and omits those it lacks. We therefore coded a binary column for each facility, assigning 1 if the hall has it and 0 if it doesn't, and dropped the original facilities column.
The final dataframe looks as follows:
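A minimal sketch of these two steps is given below, using a shortened version of the scraping output shown earlier. The helper names `total_bed_spaces` and `encode_facilities` and the reduced list of facility columns are assumptions for the example, not the code from our notebook.

```python
# Shortened scraping output for one hall (values taken from the example above)
scraped_room_lines = ["Bed spaces in total 669", "Single 246", "Single studio 36"]
scraped_facilities = ["24-hour staff cover", "Self-catered", "WiFi"]

# Reduced set of facility columns, for illustration only
facility_columns = ["Catered", "Self-catered", "Car parking", "WiFi"]


def total_bed_spaces(room_lines):
    """Return the number on the 'Bed spaces in total' line, or None if missing."""
    for line in room_lines:
        label, _, count = line.rpartition(" ")
        if label == "Bed spaces in total":
            return int(count)
    return None


def encode_facilities(listed, columns):
    """Map each facility column to 1 if the hall lists it and 0 otherwise."""
    listed = set(listed)
    return {column: int(column in listed) for column in columns}


row = {"Total Bed Spaces": total_bed_spaces(scraped_room_lines)}
row.update(encode_facilities(scraped_facilities, facility_columns))
print(row)
```

Each hall yields one such dictionary, and a list of them can be turned into a dataframe with one row per hall.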
file = 'data/accommodation_info.csv'
df3 = pd.read_csv(file)
df3.head()
| | Hyperlink | Total Bed Spaces | Catered | Self-catered | Computer room | Self-service laundry | Printing facilities | Projector/Cinema room | Quiet study space | Lift access | Car parking | Communal TV | Common room | Non-smoking | 24-hour staff cover | Secure entrance | Accessible rooms | WiFi | Bicycle storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://www.lse.ac.uk/student-life/accommodatio... | 669 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | http://www.lse.ac.uk/student-life/accommodatio... | 365 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 2 | http://www.lse.ac.uk/student-life/accommodatio... | 28 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | http://www.lse.ac.uk/student-life/accommodatio... | 106 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | http://www.lse.ac.uk/student-life/accommodatio... | 280 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
df3.dtypes
Hyperlink                object
Total Bed Spaces          int64
Catered                   int64
Self-catered              int64
Computer room             int64
Self-service laundry      int64
Printing facilities       int64
Projector/Cinema room     int64
Quiet study space         int64
Lift access               int64
Car parking               int64
Communal TV               int64
Common room               int64
Non-smoking               int64
24-hour staff cover       int64
Secure entrance           int64
Accessible rooms          int64
WiFi                      int64
Bicycle storage           int64
dtype: object
4. Data Analysis
4.1 Exploratory Data Analysis (EDA)
First, let us gain some basic insight into LSE accommodation. LSE provides 14 accommodation halls, each offering a number of room types to choose from. Sidney Webb House and Passfield Hall have the most room types (five each), while Connaught Hall has only one. For an incoming student who is unsure what kind of room they would like and wants several options, this is a useful indicator.
df2['Name'].nunique()
14
df2.groupby('Name')['Room Type'].count()
Name
Bankside House                 3
Butler's Wharf Residence       3
Carr-Saunders Hall             4
College Hall                   3
Connaught Hall                 1
High Holborn Residence         4
International Hall             3
Lilian Knowles House           3
Nutford House                  2
Passfield Hall                 5
Rosebery Hall                  3
Sidney Webb House              5
The Garden Halls               4
urbanest Westminster Bridge    4
Name: Room Type, dtype: int64
The average time it takes to get to campus from an LSE hall is 30 mins on foot, 12 mins by bike and 19 mins by public transport. The average price of living at an LSE hall is 262.29 £/week. Average prices differ by as much as 140 £/week between halls (from 202.50 to 342.50 £/week). This information can help students decide whether to book their own accommodation or go through the LSE housing process.
df1.describe()
| | Distance to Campus(km) | On Foot(min) | By Bike(min) | By Public Transport(min) | Price Min | Price Max | Average Price |
|---|---|---|---|---|---|---|---|
| count | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 |
| mean | 1.807143 | 29.928571 | 12.357143 | 19.357143 | 205.285714 | 319.285714 | 262.285714 |
| std | 0.840755 | 12.523604 | 5.637882 | 6.957090 | 53.282164 | 57.931563 | 48.643014 |
| min | 0.500000 | 11.000000 | 4.000000 | 10.000000 | 127.000000 | 259.000000 | 202.500000 |
| 25% | 1.350000 | 22.250000 | 8.250000 | 14.250000 | 169.250000 | 274.250000 | 217.750000 |
| 50% | 1.550000 | 26.000000 | 10.500000 | 17.500000 | 191.000000 | 309.000000 | 262.750000 |
| 75% | 2.500000 | 40.750000 | 17.500000 | 24.000000 | 256.250000 | 336.000000 | 289.750000 |
| max | 3.200000 | 51.000000 | 22.000000 | 34.000000 | 289.000000 | 458.000000 | 342.500000 |
With an average of 277 bed spaces each, all LSE halls are equipped with self-service laundry, a common room, a secure entrance, and WiFi. All of them are also non-smoking and have 24-hour staff cover. Other facilities vary from hall to hall.
df3.describe()
| | Total Bed Spaces | Catered | Self-catered | Computer room | Self-service laundry | Printing facilities | Projector/Cinema room | Quiet study space | Lift access | Car parking | Communal TV | Common room | Non-smoking | 24-hour staff cover | Secure entrance | Accessible rooms | WiFi | Bicycle storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 14.00000 | 14.000000 | 14.000000 | 14.000000 | 14.0 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.0 | 14.0 | 14.0 | 14.0 | 14.000000 | 14.0 | 14.000000 |
| mean | 277.00000 | 0.642857 | 0.500000 | 0.642857 | 1.0 | 0.642857 | 0.285714 | 0.428571 | 0.928571 | 0.071429 | 0.928571 | 1.0 | 1.0 | 1.0 | 1.0 | 0.500000 | 1.0 | 0.928571 |
| std | 209.38665 | 0.497245 | 0.518875 | 0.497245 | 0.0 | 0.497245 | 0.468807 | 0.513553 | 0.267261 | 0.267261 | 0.267261 | 0.0 | 0.0 | 0.0 | 0.0 | 0.518875 | 0.0 | 0.267261 |
| min | 26.00000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 0.000000 | 1.0 | 0.000000 |
| 25% | 117.00000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 0.000000 | 1.0 | 1.000000 |
| 50% | 253.50000 | 1.000000 | 0.500000 | 1.000000 | 1.0 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 0.500000 | 1.0 | 1.000000 |
| 75% | 425.75000 | 1.000000 | 1.000000 | 1.000000 | 1.0 | 1.000000 | 0.750000 | 1.000000 | 1.000000 | 0.000000 | 1.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 1.000000 | 1.0 | 1.000000 |
| max | 669.00000 | 1.000000 | 1.000000 | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 1.000000 | 1.0 | 1.000000 |
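In the summary table, a facility that every hall offers shows up as a binary column whose mean is 1.0 and whose standard deviation is 0.0. The same check can be made directly, as in the sketch below. It uses a small hand-typed sample (a few columns from the first five rows of df3) rather than the full dataframe, and the variable names are assumptions for the example.

```python
import pandas as pd

# A few columns from the first five rows of df3, typed in by hand for illustration
sample = pd.DataFrame({
    "Total Bed Spaces": [669, 365, 28, 106, 280],
    "Catered": [0, 0, 1, 1, 0],
    "Self-service laundry": [1, 1, 1, 1, 1],
    "Lift access": [1, 1, 1, 0, 1],
    "WiFi": [1, 1, 1, 1, 1],
})

# Every column except the bed-space count is a binary facility flag
facility_columns = sample.columns.drop("Total Bed Spaces")

# A facility is universal if every hall in the sample has it
universal = [column for column in facility_columns if (sample[column] == 1).all()]
print(universal)
```

Applied to the full df3, the same check should flag the facilities whose mean is 1.0 in the table above.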
After gaining some basic understanding of our data, let us try to answer the questions we raised.
4.2 What is the relationship between the prices of each hall and their distance to campus?
From the line graph below, we can see that the average distance between the halls and campus is 1.81 km. High Holborn Residence is the closest to campus, while Butler's Wharf Residence is the furthest. The graph can help eliminate or narrow down options if distance is a key factor for you.
import matplotlib.pyplot as plt
avg_distance = df1['Distance to Campus(km)'].mean()
plt.figure(figsize=(10, 6))
plt.plot(df1['Name'], df1['Distance to Campus(km)'], marker='o', linestyle='-')
plt.axhline(y=avg_distance, color='r', linestyle='--', label=f'Average Distance: {avg_distance:.2f} km')
plt.title("Different Accommodations' Distance to Campus")
plt.xlabel('Accommodation Name')
plt.ylabel('Distance to Campus (km)')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
It is natural that travel time by each mode of transportation is highly correlated with distance to campus. Across halls, travel times follow the same pattern as the distance to campus. The bar chart also shows that, on average, cycling is the most time-efficient way of traveling to campus, faster even than public transport.
import plotly.graph_objects as go
import plotly.express as px
distance_data = df1[['Name', 'Distance to Campus(km)']].set_index('Name')
custom_colors = ['blue', 'green', 'red']
fig = px.bar(
    df1,
    x='Name',
    y=['On Foot(min)', 'By Bike(min)', 'By Public Transport(min)'],
    title='Travel Time to Campus by Foot, Bike, and Public Transport',
    labels={'value': 'Time in Minutes', 'variable': 'Mode of Transport', 'Name': 'Halls'},
    barmode='group',
    color_discrete_sequence=custom_colors
)
fig.update_layout(
    xaxis_title='Accommodation Name',
    yaxis_title='Time in Minutes',
    legend_title='Modes of Transportation',
    xaxis_tickangle=-45
)
# Overlay the distance to campus as a line on a secondary y-axis
fig.add_trace(
    go.Scatter(
        x=distance_data.index,
        y=distance_data['Distance to Campus(km)'],
        mode='lines+markers',
        name='Distance to Campus',
        yaxis='y2'
    )
)
fig.update_layout(
    yaxis2=dict(
        title='Distance to Campus (km)',
        overlaying='y',
        side='right'
    )
)
fig.show()
However, it is interesting to see that the correlation between public transport travel time and distance (0.86) is lower than for the other two modes of transportation (0.98 and above). Travel time by public transport still increases with distance, but with more variability. Factors such as route congestion, stops, and schedules may influence public transport travel time beyond a simple linear relationship with distance. For students who choose a hall further from campus, the graph above shows the available ways of getting to campus and how long each takes.
distance = df1[['Distance to Campus(km)', 'On Foot(min)', 'By Bike(min)', 'By Public Transport(min)']]
correlation_matrix1 = distance.corr()
correlation_matrix1
| | Distance to Campus(km) | On Foot(min) | By Bike(min) | By Public Transport(min) |
|---|---|---|---|---|
| Distance to Campus(km) | 1.000000 | 0.995080 | 0.979606 | 0.863553 |
| On Foot(min) | 0.995080 | 1.000000 | 0.988529 | 0.888489 |
| By Bike(min) | 0.979606 | 0.988529 | 1.000000 | 0.916282 |
| By Public Transport(min) | 0.863553 | 0.888489 | 0.916282 | 1.000000 |
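The coefficients in this table are Pearson correlation coefficients, which is what `DataFrame.corr()` computes by default. To make the number concrete, the sketch below computes the coefficient by hand for distance and walking time, using only the five halls shown in the preview of df1 above, so the result will differ slightly from the full-sample value in the table. The function name `pearson_r` is our own choice for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# First five halls from the df1 preview
distance_km = [1.5, 2.9, 1.2, 1.0, 3.2]
on_foot_min = [25, 45, 22, 19, 51]

r = pearson_r(distance_km, on_foot_min)
print(round(r, 3))
```

A value close to 1 means the two columns rise and fall together almost perfectly, which is expected because walking time is essentially distance divided by a roughly constant walking speed.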
import seaborn as sns
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix1, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Between Distance and Travel Time')
Now let us examine whether the price of each hall is related to its distance to campus. Distance is negatively correlated with all three price measures (minimum, maximum, and average), with coefficients between -0.23 and -0.36. This means that rent tends to decrease as the distance to campus increases, but the relationship is weak to moderate.
Students seeking accommodation closer to campus may expect to pay higher rent prices compared to those living further away. This could be due to the convenience and proximity to academic buildings, amenities, and social activities on campus.
distance_price = df1[['Distance to Campus(km)', 'Price Min', 'Price Max', 'Average Price']]
correlation_matrix2 = distance_price.corr()
correlation_matrix2
| | Distance to Campus(km) | Price Min | Price Max | Average Price |
|---|---|---|---|---|
| Distance to Campus(km) | 1.000000 | -0.355154 | -0.234417 | -0.334103 |
| Price Min | -0.355154 | 1.000000 | 0.529610 | 0.863056 |
| Price Max | -0.234417 | 0.529610 | 1.000000 | 0.885537 |
| Average Price | -0.334103 | 0.863056 | 0.885537 | 1.000000 |
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix2, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Between Distance and Price')
From the scatter plot below we can see that most of the accommodations are located within 2km of the campus. There are many halls at about 1.5km from campus, but the price differences among them are wide, ranging from the highest to almost the lowest. Although we cannot jump to the conclusion that distance causes price to vary, we do find that all the accommodations with an average price of 300 £/week and above are located near campus, and those that are more than 3km from campus have almost the lowest average price.
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df1, x='Distance to Campus(km)', y='Average Price')
plt.title('Relationship between Distance and Average Price')
plt.xlabel('Distance(km)')
plt.ylabel('Average Price(£/week)')
plt.grid(True)
plt.show()
We can create a connection graph to better represent each hall's distance to campus. For more information, we also attached the closest tube stations to each hall. This offers us insight on the geographical distribution of LSE accommodations in London and the degree of concentration of the halls to some extent.
import networkx as nx
accommodations = df1[['Name','Distance to Campus(km)','Station 1', 'Station 2', 'Station 3', 'Station 4']]
G = nx.Graph()
for index, row in accommodations.iterrows():
    G.add_node(row['Name'], distance=row['Distance to Campus(km)'])
G.add_node("Campus")
for index, row in accommodations.iterrows():
    G.add_edge(row['Name'], "Campus", distance=row['Distance to Campus(km)'])
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_size=2000, node_color='skyblue', font_size=10, font_weight='bold')
edge_labels = nx.get_edge_attributes(G, 'distance')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.title('Connection Map of Accommodations to Campus')
plt.show()
LSE is located at the heart of central London, so most of the halls it provides are also located near major transportation hubs such as King's Cross/St Pancras and Euston Station. This map is very helpful for students who want to travel around London in their spare time.
G = nx.Graph()
G.add_node('Campus', node_type='campus')
for index, row in accommodations.iterrows():
    hotel_name = row['Name']
    G.add_node(hotel_name, node_type='hotel')
    for i in range(1, 5):
        station = row[f'Station {i}']
        if pd.notnull(station):
            G.add_node(station, node_type='station')
for index, row in accommodations.iterrows():
    hotel_name = row['Name']
    G.add_edge('Campus', hotel_name)
    for i in range(1, 5):
        station = row[f'Station {i}']
        if pd.notnull(station):
            G.add_edge(hotel_name, station)
pos = nx.spring_layout(G, k=0.3)
node_colors = ['blue' if node_type == 'hotel' else 'green' if node_type == 'station' else 'red' for node_type in nx.get_node_attributes(G, 'node_type').values()]
plt.figure(figsize=(14, 10))
nx.draw(G, pos, node_color=node_colors, with_labels=True)
plt.title('Connection Map of Accommodations to Campus and Closest Stations Around Accommodations')
plt.show()
4.3 How do prices vary across different room types in each hall?
The average price of each hall gives a good preview of how accommodation prices differ from one another. The interactive bar graph is also color-coded to help students identify which halls fit their weekly budget.
average_prices = df2.groupby('Name')['Price(£/week)'].mean().reset_index()
average_prices['Price(£/week)'] = average_prices['Price(£/week)'].round(2)
rainbow_colors = [
'#FF0000',
'#FF7F00',
'#FFFF00',
'#00FF00',
'#0000FF',
'#4B0082',
'#9400D3'
]
# The color column is numeric, so plotly uses a continuous color scale
fig = px.bar(average_prices, x='Price(£/week)', y='Name',
             title='Average Price by Accommodation',
             labels={'Price(£/week)': 'Average Price (£/week)', 'Name': 'Accommodation Name'},
             orientation='h',
             color='Price(£/week)',
             color_continuous_scale=rainbow_colors)
fig.show()
On the other hand, the box-and-whisker plot gives important insight into how much the average hides the spread of prices within each hall. For example, urbanest Westminster Bridge also has cheaper options despite sitting at the higher end of the scale. A student may really like its location and convenience but rule the hall out because of its high average. Passfield Hall is another example where the average does not paint the full picture, as some of its prices rival those of other halls.
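To illustrate the point with numbers, the sketch below compares the average with the cheapest and most expensive option for urbanest Westminster Bridge, using the four room prices shown in the preview of df2 in Section 3.2. It is a plain-Python illustration with variable names of our own choosing; the plots in this section are drawn from the full dataframe.

```python
from statistics import mean

# Room prices (£/week) for urbanest Westminster Bridge, from the df2 preview
urbanest_prices = {
    "Single room": 298.00,
    "Single en suite room": 316.90,
    "Twin en suite room": 233.66,
    "Single studio": 439.17,
}

average_price = mean(urbanest_prices.values())
cheapest_room = min(urbanest_prices, key=urbanest_prices.get)
priciest_room = max(urbanest_prices, key=urbanest_prices.get)

print(f"Average: {average_price:.2f} £/week")
print(f"Cheapest: {cheapest_room} at {urbanest_prices[cheapest_room]:.2f} £/week")
print(f"Most expensive: {priciest_room} at {urbanest_prices[priciest_room]:.2f} £/week")
```

The cheapest room type costs well below the hall's average, which is exactly the kind of option a student would miss by looking at the average alone.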
plt.figure(figsize=(14, 8))
sns.boxplot(x='Price(£/week)', y='Name', data=df2)
plt.title('Price Distribution by Accommodation')
plt.xlabel('Price (£/week)')
plt.ylabel('Accommodation Name')
plt.show()
Let us explore whether having a private or shared bathroom would influence the price of the room. Here we use a binary variable to represent Bathroom Type, where 0 refers to shared and 1 refers to private.
df2['Bathroom Type'] = df2['Bathroom Type'].replace({'Shared bathroom': 0, 'Private bathroom': 1})
df2.head()
| | Name | Room Type | Bathroom Type | Price(£/week) | Size Approximation(m²) |
|---|---|---|---|---|---|
| 0 | urbanest Westminster Bridge | Single room | 0 | 298.00 | 8.5 |
| 1 | urbanest Westminster Bridge | Single en suite room | 1 | 316.90 | 13.4 |
| 2 | urbanest Westminster Bridge | Twin en suite room | 1 | 233.66 | 25.3 |
| 3 | urbanest Westminster Bridge | Single studio | 1 | 439.17 | 22.1 |
| 4 | Lilian Knowles House | Single en suite room | 1 | 220.60 | 11.0 |
A positive correlation coefficient of 0.42 between Bathroom Type and Room Size suggests that rooms with private bathrooms tend to be larger, which is consistent with common sense.
The correlation between Bathroom Type and Price is 0.34, which indicates that rooms with private bathrooms tend to have higher prices compared to rooms with shared bathrooms.
There is a negative correlation between Room Size and Price, but -0.13 is rather weak. It indicates that there is a tendency for larger rooms to have slightly lower prices, although this relationship is not very strong.
type_price = df2[['Bathroom Type', 'Price(£/week)', 'Size Approximation(m²)']]
correlation_matrix3 = type_price.corr()
correlation_matrix3
| | Bathroom Type | Price(£/week) | Size Approximation(m²) |
|---|---|---|---|
| Bathroom Type | 1.000000 | 0.336833 | 0.417159 |
| Price(£/week) | 0.336833 | 1.000000 | -0.134955 |
| Size Approximation(m²) | 0.417159 | -0.134955 | 1.000000 |
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix3, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Between Bathroom Type, Size and Price')
plt.show()
Let us look more closely at two of these relationships: Bathroom Type and Price, and Price and Size.
This violin plot shows the relationship between bathroom type and price. Overall, the prices of rooms with private bathrooms span a wider range and are generally higher than those of rooms with shared bathrooms. Private-bathroom rooms are mainly concentrated at £300-400 per week, while shared-bathroom rooms are mainly concentrated at £200-300 per week.
plt.figure(figsize=(8, 6))
sns.violinplot(x='Bathroom Type', y='Price(£/week)', data=df2)
plt.title('Relationship between Bathroom Type and Price')
plt.xlabel('Bathroom Type')
plt.ylabel('Price (£/week)')
plt.xticks([0, 1], ['Shared', 'Private'])
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
We can fit a regression line to this scatter plot to summarise the relationship between Price and Size. The line slopes downward, which indicates a negative relationship. Interestingly, the point in the upper-left corner combines the lowest price with the largest room size, which is unexpected.
x_column = 'Price(£/week)'
y_column = 'Size Approximation(m²)'
plt.figure(figsize=(8, 6))
sns.regplot(x=x_column, y=y_column, data=df2, scatter_kws={'s': 100}, color='skyblue')
plt.title(f'Correlation between {x_column} and {y_column}')
plt.xlabel(x_column)
plt.ylabel(y_column)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
Typically, we would expect a larger room to cost more, yet our analysis shows the opposite relationship. One possible reason is the difference in room capacity, since room sizes differ across single, twin and triple rooms. Let us test this assumption by assigning 0 to single rooms, 1 to twin rooms and 2 to triple rooms.
# Copy the filtered rows so the assignment below does not write to a slice of df2
filtered_df = df2[df2['Room Type'].str.contains('single|twin|triple', case=False)].copy()
# Encode capacity: 0 = single, 1 = twin, 2 = triple
filtered_df['Room Type'] = np.where(filtered_df['Room Type'].str.contains('single', case=False), 0,
    np.where(filtered_df['Room Type'].str.contains('twin', case=False), 1,
        np.where(filtered_df['Room Type'].str.contains('triple', case=False), 2, -1)))
filtered_df.head()
| | Name | Room Type | Bathroom Type | Price(£/week) | Size Approximation(m²) |
|---|---|---|---|---|---|
| 0 | urbanest Westminster Bridge | 0 | 0 | 298.00 | 8.5 |
| 1 | urbanest Westminster Bridge | 0 | 1 | 316.90 | 13.4 |
| 2 | urbanest Westminster Bridge | 1 | 1 | 233.66 | 25.3 |
| 3 | urbanest Westminster Bridge | 0 | 1 | 439.17 | 22.1 |
| 4 | Lilian Knowles House | 0 | 1 | 220.60 | 11.0 |
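The capacity encoding above can also be written with `np.select`, which avoids nesting one `np.where` inside another. This is a sketch under stated assumptions: `toy`, its room-type strings and the new `Capacity` column are hypothetical and stand in for `df2`.

```python
import numpy as np
import pandas as pd

# Hypothetical room types (illustrative, not the scraped data)
toy = pd.DataFrame({
    'Room Type': ['Single room', 'Twin en suite room', 'Triple room', 'Double studio'],
})

room = toy['Room Type'].str.lower()

# One condition per capacity level, checked in order
conditions = [
    room.str.contains('single'),
    room.str.contains('twin'),
    room.str.contains('triple'),
]

# 0 = single, 1 = twin, 2 = triple, -1 = anything else
toy['Capacity'] = np.select(conditions, [0, 1, 2], default=-1)

# Keep only rows with a known capacity; .copy() makes the result independent of toy
encoded = toy.loc[toy['Capacity'] >= 0].copy()
print(encoded)
```

Writing the code into a separate `Capacity` column keeps the original room-type text available for later steps, which is a design choice of this sketch rather than something the notebook does.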
The correlation map is consistent with our assumption. The strong positive correlation of 0.70 between room type and size shows that size tends to increase with room capacity.
The strong negative correlation of -0.79 between room capacity and price shows that the price tends to fall as the room type goes from single to triple. This relationship is intuitive: single rooms typically cost more than twin rooms, and twin rooms generally cost more than triple rooms. The pattern aligns with the idea that shared rooms such as triples are more cost-effective per person, as the cost is divided among more occupants.
roomtype_price = filtered_df[['Room Type', 'Price(£/week)', 'Size Approximation(m²)']]
correlation_matrix4 = roomtype_price.corr()
correlation_matrix4
| | Room Type | Price(£/week) | Size Approximation(m²) |
|---|---|---|---|
| Room Type | 1.000000 | -0.785060 | 0.702817 |
| Price(£/week) | -0.785060 | 1.000000 | -0.320083 |
| Size Approximation(m²) | 0.702817 | -0.320083 | 1.000000 |
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Between Room Type, Size and Price')
plt.show()
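One way to check whether capacity explains the negative size-price relationship is to compute the correlation within each capacity group and compare it with the pooled value. The sketch below uses made-up numbers chosen to illustrate the effect; `toy` and its columns are assumptions, not the notebook's data.

```python
import pandas as pd

# Hypothetical rooms: capacity 0 = single, 1 = twin (illustrative values)
toy = pd.DataFrame({
    'Capacity': [0, 0, 0, 1, 1, 1],
    'Size': [9.0, 11.0, 13.0, 20.0, 22.0, 24.0],
    'Price': [250.0, 270.0, 290.0, 160.0, 170.0, 180.0],
})

# Correlation across all rooms, ignoring capacity
pooled = toy['Size'].corr(toy['Price'])

# Correlation computed separately inside each capacity group
within = {cap: grp['Size'].corr(grp['Price']) for cap, grp in toy.groupby('Capacity')}

print('pooled:', pooled)
print('within groups:', within)
```

In this toy data, larger rooms cost more inside each group, yet the pooled correlation is negative because twin rooms are both larger and cheaper per person. Whether the scraped data behaves the same way would need to be checked on `filtered_df`.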
When we compare the specific room types offered across halls, most halls charge similar weekly prices. However, there are still outliers with price differences large enough to sway a decision. Here we provide comparisons across halls for each room type. For room types unique to a single hall, we decided that a chart would be more confusing than helpful, so they are not presented here.
The most common room type, the single room, costs as much as £317.80 per week at High Holborn Residence, while Butler's Wharf Residence offers one for £231. These differences may come down to other factors such as distance to campus, facilities, and whether the hall is catered. When prices are very similar, such external factors can be decisive.
df2['Room Type'] = df2['Room Type'].str.lower().str.strip()
room_types = df2['Room Type'].unique()
plots = []
for room_type in room_types:
    filtered_data = df2[df2['Room Type'] == room_type]
    average_prices = filtered_data.groupby('Name')['Price(£/week)'].mean().reset_index()
    # Only plot room types offered by at least two halls
    if len(average_prices) >= 2:
        fig = px.bar(average_prices, x='Price(£/week)', y='Name',
                     title=f'Price by Accommodation for {room_type}',
                     labels={'Price(£/week)': 'Average Price (£/week)', 'Name': 'Accommodation Name'},
                     orientation='h')
        plots.append(fig)
for plot in plots:
    plot.show()
If the student has a clear idea of what type of room they want to live in, they can refer to this price list and choose a specific hall according to the room type and prices.
df2['Room Type'] = df2['Room Type'].str.lower().str.strip()
grouped_df = df2.groupby(['Room Type', 'Name'])['Price(£/week)'].agg(list)
grouped_df
Room Type Name
3 bedroom flat Sidney Webb House [277.9]
double en suite room College Hall [392.63]
Sidney Webb House [290.85]
double room Butler's Wharf Residence [278.95]
double studio International Hall [321.93]
Sidney Webb House [336.7]
one-bed flat Lilian Knowles House [319.79]
queen studio Sidney Webb House [331.45]
single en suite room Bankside House [287.17]
College Hall [332.43]
High Holborn Residence [333.2]
Lilian Knowles House [220.6]
Passfield Hall [287.35]
Sidney Webb House [233.28]
The Garden Halls [358.61, 372.89, 314.16]
urbanest Westminster Bridge [316.9]
single room Bankside House [259.7]
Butler's Wharf Residence [231.0]
Carr-Saunders Hall [257.25]
College Hall [289.73]
Connaught Hall [273.63]
High Holborn Residence [317.8]
International Hall [266.28]
Nutford House [250.18]
Passfield Hall [252.88]
Rosebery Hall [255.85]
The Garden Halls [279.2]
urbanest Westminster Bridge [298.0]
single room with queen bed Carr-Saunders Hall [259.35]
High Holborn Residence [333.9]
single studio International Hall [298.13]
Lilian Knowles House [283.37]
urbanest Westminster Bridge [439.17]
triple room Passfield Hall [135.27]
twin en suite room Bankside House [176.92]
Carr-Saunders Hall [185.5]
High Holborn Residence [184.45]
Passfield Hall [182.7]
Rosebery Hall [185.5]
urbanest Westminster Bridge [233.66]
twin room Butler's Wharf Residence [140.52]
Carr-Saunders Hall [173.25]
Nutford House [183.4]
Passfield Hall [162.75]
Rosebery Hall [168.7]
Name: Price(£/week), dtype: object
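The grouped listing above could also be laid out as a grid with room types as rows and halls as columns, which makes side-by-side comparison easier. This is a minimal sketch using `pivot_table` on hypothetical data; the hall names and prices in `toy` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical listings (illustrative values, not the scraped data)
toy = pd.DataFrame({
    'Name': ['Hall A', 'Hall A', 'Hall B', 'Hall B', 'Hall B'],
    'Room Type': ['single room', 'twin room', 'single room', 'single room', 'twin room'],
    'Price(£/week)': [250.0, 160.0, 270.0, 290.0, 170.0],
})

# Rows are room types, columns are halls, cells hold the mean weekly price
price_table = toy.pivot_table(index='Room Type', columns='Name',
                              values='Price(£/week)', aggfunc='mean')

# Cheapest hall for each room type
cheapest = price_table.idxmin(axis=1)

print(price_table.round(2))
print(cheapest)
```

Halls that do not offer a room type would appear as `NaN` in such a table, and `idxmin` skips those cells by default.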
4.4 Is the price worth what the hall provides in their facilities and catering system?
In this part we examine whether catering has an impact on price. To answer this question, we need data from both df1 and df3.
columns_to_keep_df3 = ['Hyperlink', 'Total Bed Spaces', '24-hour staff cover', 'Non-smoking', 'Secure entrance', 'Self-catered', 'WiFi', 'Car parking', 'Common room', 'Bicycle storage', 'Lift access', 'Projector/Cinema room', 'Communal TV', 'Printing facilities', 'Quiet study space', 'Catered', 'Accessible rooms', 'Computer room', 'Self-service laundry']
columns_to_keep_df1 = ['Name', 'Price Range(£/week)']
df_3 = df3[columns_to_keep_df3]
df_1 = df1[columns_to_keep_df1]
combined_df = pd.concat([df_3, df_1], axis=1)
combined_df.head()
| | Hyperlink | Total Bed Spaces | 24-hour staff cover | Non-smoking | Secure entrance | Self-catered | WiFi | Car parking | Common room | Bicycle storage | ... | Projector/Cinema room | Communal TV | Printing facilities | Quiet study space | Catered | Accessible rooms | Computer room | Self-service laundry | Name | Price Range(£/week) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://www.lse.ac.uk/student-life/accommodatio... | 669 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | urbanest Westminster Bridge | 227-458 |
| 1 | http://www.lse.ac.uk/student-life/accommodatio... | 365 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | Lilian Knowles House | 198-336 |
| 2 | http://www.lse.ac.uk/student-life/accommodatio... | 28 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | College Hall | 289-392 |
| 3 | http://www.lse.ac.uk/student-life/accommodatio... | 106 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | International Hall | 266-321 |
| 4 | http://www.lse.ac.uk/student-life/accommodatio... | 280 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | Butler's Wharf Residence | 127-278 |
5 rows × 21 columns
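The descriptive statistics show that most of the halls are catered (9 of 14). Because the price range is still stored as text at this point, the summary reports only counts and the most frequent value rather than numeric statistics. The wider price range among non-catered halls becomes visible in the box plot that follows.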
data = combined_df[['Name', 'Price Range(£/week)', 'Catered']].copy()
descriptive_stats = data.groupby('Catered')['Price Range(£/week)'].describe()
descriptive_stats
| Catered | count | unique | top | freq |
|---|---|---|---|---|
| 0 | 5 | 5 | 227-458 | 1 |
| 1 | 9 | 9 | 289-392 | 1 |
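Since `describe()` on a text column only returns count, unique, top and freq, numeric summaries need the range strings to be parsed first. The sketch below does this for the four price ranges visible in the `combined_df.head()` output; the `Min Price`, `Max Price`, `Midpoint` and `Spread` column names are assumptions introduced for illustration.

```python
import pandas as pd

# The first four halls shown in the table above (Catered flag and price range)
toy = pd.DataFrame({
    'Catered': [0, 0, 1, 1],
    'Price Range(£/week)': ['227-458', '198-336', '289-392', '266-321'],
})

# Split "min-max" strings into two numeric columns
bounds = (toy['Price Range(£/week)']
          .str.replace('£', '', regex=False)
          .str.split('-', expand=True)
          .astype(float))
bounds.columns = ['Min Price', 'Max Price']
toy = toy.join(bounds)

# Midpoint approximates a typical price; spread measures how wide the range is
toy['Midpoint'] = (toy['Min Price'] + toy['Max Price']) / 2
toy['Spread'] = toy['Max Price'] - toy['Min Price']

by_catering = toy.groupby('Catered')[['Midpoint', 'Spread']].mean()
print(by_catering)
```

With only four halls this is purely a demonstration of the parsing step; applying it to all 14 halls would be needed before drawing any conclusion about catered versus non-catered spreads.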
The boxplot below compares prices between catered and non-catered halls to see whether catering truly makes a difference in price.
Most catered halls appear to fall within roughly £260-280 per week. Outliers are hidden in this plot (showfliers=False), so their absence is not itself a finding. The non-catered halls vary far more, although the typical non-catered hall (the green median line) appears cheaper than the typical catered hall. The wide spread among non-catered halls suggests that other factors, perhaps another predictor, are causing the price to fluctuate. The most expensive accommodation is in fact not catered, which points the same way.
data['Price Range(£/week)'] = data['Price Range(£/week)'].str.replace('£', '').str.split('-', expand=True).astype(float).mean(axis=1)
# Pass an explicit Axes so pandas draws on this figure instead of leaving it empty
fig, ax = plt.subplots(figsize=(10, 6))
try:
    data.boxplot(column='Price Range(£/week)', by='Catered', showfliers=False, ax=ax)
    plt.title('Distribution of Price Range for Catered and Non-Catered Accommodation')
    plt.xlabel('Catered (1: Yes, 0: No)')
    plt.ylabel('Price Range (£/week)')
    # Groups are drawn in sorted order: position 1 is Catered == 0, position 2 is Catered == 1
    plt.xticks([1, 2], ['Non-Catered', 'Catered'])
    plt.grid(True)
    plt.show()
except TypeError as e:
    print("Error:", e)
We have calculated two values below: the point-biserial correlation coefficient and its p-value.
The point-biserial correlation coefficient indicates the strength and direction of the relationship between a binary variable and a continuous one; in this case it is negative and very weak. The p-value is the probability of observing a correlation at least this extreme if there were no true association. Because the p-value is very high (about 0.68), the correlation is not statistically significant.
from scipy.stats import pointbiserialr
#also learned this in another stats class
data['Price Range(£/week)'] = data['Price Range(£/week)'].astype(str)
data['Price Range(£/week)'] = data['Price Range(£/week)'].str.replace('£', '').str.split('-', expand=True).astype(float).mean(axis=1)
correlation, p_value = pointbiserialr(data['Catered'], data['Price Range(£/week)'])
print("Point Biserial Correlation Coefficient:", correlation)
print("P-value:", p_value)
Point Biserial Correlation Coefficient: -0.11948773048658984
P-value: 0.6841092799016989
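For readers unfamiliar with the statistic, the point-biserial coefficient is the Pearson correlation applied to a 0/1 variable, and it can be computed directly from the two group means. The following sketch checks this with NumPy only; the `catered` flags and `price` values are hypothetical, not the halls' actual data.

```python
import numpy as np

# Hypothetical data: 1 = catered, 0 = not catered; prices are illustrative
catered = np.array([0, 0, 0, 1, 1, 1, 1])
price = np.array([340.0, 265.0, 200.0, 330.0, 295.0, 270.0, 260.0])

# Group means and group sizes
m1 = price[catered == 1].mean()
m0 = price[catered == 0].mean()
n1 = (catered == 1).sum()
n0 = (catered == 0).sum()
n = len(price)

# Point-biserial formula using the population standard deviation (ddof=0)
r_pb = (m1 - m0) / price.std(ddof=0) * np.sqrt(n1 * n0 / n**2)

# Ordinary Pearson correlation between the 0/1 flag and price
r_pearson = np.corrcoef(catered, price)[0, 1]

print(r_pb, r_pearson)
```

The two numbers agree up to floating-point error, so the sign of the coefficient simply reflects which group has the higher mean price.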
4.5 Other possible factors that might affect price
This bar chart shows the total number of beds available to LSE students in each hall. It can help incoming students who are looking for a large community and a busy social life, or for some seclusion. It may also hint at competition for allocation, since many spots fill up quickly once accommodation applications open.
plt.figure(figsize=(10, 6))
hall_names = [link.split('/')[-2].replace('-', ' ').title() for link in df3['Hyperlink']]
plt.bar(hall_names, df3['Total Bed Spaces'], color='skyblue')
plt.xlabel('Hall Name')
plt.ylabel('Total Bed Spaces')
plt.title('Total Bed Spaces of Accommodations')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
The scatter plot below shows total bed spaces against average price per hall. The graph shows no obvious clusters or patterns, but we calculated the correlation anyway to check for a relationship between the number of beds offered in a building and the price of one of those beds. A correlation of 0.03 indicates essentially no linear relationship. With such a low correlation, total bed spaces alone cannot reliably predict average price; there must be other determinants.
from scipy.stats import pearsonr
#learned this in my other stats class but have cited some resources we used
# Copy the selected columns so the new price columns are added to an independent DataFrame
data = combined_df[['Name', 'Total Bed Spaces', 'Price Range(£/week)']].copy()
data[['Min Price', 'Max Price']] = data['Price Range(£/week)'].str.replace('£', '').str.split('-', expand=True).astype(float)
data['Average Price'] = (data['Min Price'] + data['Max Price']) / 2
correlation_coefficient, _ = pearsonr(data['Total Bed Spaces'], data['Average Price'])
print("Correlation coefficient:", correlation_coefficient)
plt.figure(figsize=(10, 6))
# cmap is dropped: it has no effect unless per-point colours are passed via c
plt.scatter(data['Total Bed Spaces'], data['Average Price'], alpha=0.5)
plt.title('Total Bed Spaces vs Average Price')
plt.xlabel('Total Bed Spaces')
plt.ylabel('Average Price (£/week)')
plt.grid(True)
plt.text(0.1, 0.9, f'Correlation: {correlation_coefficient:.2f}', transform=plt.gca().transAxes, fontsize=12, verticalalignment='top')
plt.show()
Correlation coefficient: 0.03112359766149185
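With only 14 halls, a correlation this small is hard to distinguish from noise. One way to see that is a permutation test, which shuffles the prices many times and counts how often a correlation at least as large arises by chance. This is a sketch on hypothetical numbers; `beds`, `price`, the seed and the number of permutations are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bed counts and average prices for 14 halls (illustrative values)
beds = np.array([650, 350, 30, 100, 280, 450, 320, 150, 500, 90, 230, 410, 60, 700],
                dtype=float)
price = np.array([340, 265, 340, 295, 200, 260, 275, 255, 310, 240, 285, 250, 300, 265],
                 dtype=float)

# Correlation observed in the (toy) data
observed = np.corrcoef(beds, price)[0, 1]

# Shuffle prices to break any real link with beds, then recompute the correlation
n_perm = 5000
perm_r = np.array([np.corrcoef(beds, rng.permutation(price))[0, 1]
                   for _ in range(n_perm)])

# Two-sided p-value: share of shuffles at least as extreme as the observed value
p_value = np.mean(np.abs(perm_r) >= abs(observed))

print('observed r:', round(observed, 3))
print('permutation p-value:', p_value)
```

A large p-value here would mean that shuffled data often produces a correlation as strong as the observed one, which supports reading a near-zero coefficient as "no detectable linear relationship" rather than as a weak positive effect.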
5. Conclusion
Choosing accommodation involves many considerations. This project aims to streamline the accommodation search and help individuals compare halls easily to determine the best fit for them.
Our first question was whether there is any relationship between price and distance to campus. Through analysis and data visualization, we found no set pattern, although the more expensive accommodations tend to be closer to campus.
The second question was how prices vary across room types. We found that rooms with private bathrooms tend to sit in the higher price range, with a positive correlation coefficient. Interestingly, we found a negative relationship between price and room size. We believe this is because the larger rooms are mostly shared rooms, which are cheaper per person. We also created a room price list for students who have a budget in mind and some idea of whether they want a single, double or triple room, a shared bathroom, and so on.
In our third question, we asked whether the price is worth the facilities offered. Most accommodations have basics such as WiFi, a common room and security, but a major difference is whether the hall is catered. In the box plot the typical catered hall is more expensive than the typical non-catered hall, yet the most expensive accommodation has no catering, and the point-biserial correlation between catering and price is very weak and not statistically significant. This leads us to believe that other predictors are driving price.
Lastly, we explored other factors that may affect price. The code can be applied to any variable, but we focused on the number of beds in an accommodation building and compared that to average price. The correlation coefficient is very weak, which leads us to believe that this predictor has little effect and that price depends on something else, or on a combination of factors. Our visualizations and analysis should make it easier for people to compare halls.
In conclusion, students can weigh the various factors that affect accommodation prices according to their needs and make trade-offs between them.
However, we recognize that our analysis has limitations and could be expanded in several areas:
- Access to internal data, such as satisfaction surveys or yearly booking rates, would be highly informative and would allow more in-depth analysis.
- We could also have explored and tested more variables, such as nearby grocery stores or hospitals. Qualitative factors such as the reputation of the neighborhood could also have been taken into consideration.
- Our analysis is based on the prices listed on the website for 2023-2024. We do not have access to past prices, nor do we know whether prices might go up. A longitudinal analysis could help account for changes in university policies, economic conditions, or seasonal bookings.
While our analysis sheds light on key factors, it is imperative to recognize its limitations and the opportunities for further research. Ultimately, continual refinement and expansion of this analysis will help give students the information they need to navigate the accommodation search effectively.
6. Reference
LSE Accommodation Website: https://www.lse.ac.uk/student-life/accommodation/search-accommodation
Additional code we used for scraping and creating dataframes: Scraping Code and Creating Dataframes
Reference for scipy.stats (also covered in a separate statistics class): https://www.statology.org/point-biserial-correlation-python/